GLOWS - fix duplication issue #3175

Merged
laspsandoval merged 1 commit into IMAP-Science-Operations-Center:dev from laspsandoval:glows_histogram_duplicates
May 15, 2026

Conversation

@laspsandoval
Contributor

This pull request adds logic to deduplicate histogram entries in the generate_histogram_dataset function, ensuring that only the first occurrence of each unique (imap_start_time, imap_time_offset) pair is kept. Additionally, a new test verifies that the deduplication works as intended.

Deduplication logic:

  • Added a deduplication step in generate_histogram_dataset (in glows_l1a.py) that filters out duplicate histogram entries based on imap_start_time and imap_time_offset, logging a warning if any duplicates are found.
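The deduplication step described above can be sketched in isolation. This is a minimal illustration, not the code merged in `glows_l1a.py`: `FakeTime`, `FakeHistogram`, `deduplicate_histograms`, and the warning text are hypothetical stand-ins, and only the `seconds`/`subseconds` key fields are taken from the PR.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class FakeTime:
    """Hypothetical stand-in for a seconds/subseconds time field."""

    seconds: int
    subseconds: int


@dataclass
class FakeHistogram:
    """Hypothetical stand-in for HistogramL1A, reduced to the key fields."""

    imap_start_time: FakeTime
    imap_time_offset: FakeTime
    label: str = ""


def deduplicate_histograms(hists):
    """Keep the first histogram for each (start time, time offset) key."""
    seen = {}
    for hist in hists:
        key = (
            hist.imap_start_time.seconds,
            hist.imap_start_time.subseconds,
            hist.imap_time_offset.seconds,
            hist.imap_time_offset.subseconds,
        )
        # setdefault stores the value only when the key is new, so a later
        # duplicate never overwrites the first occurrence.
        seen.setdefault(key, hist)
    # dicts preserve insertion order, so the original ordering is kept.
    unique = list(seen.values())
    n_dropped = len(hists) - len(unique)
    if n_dropped:
        logger.warning("Dropped %d duplicate histogram record(s).", n_dropped)
    return unique


hists = [
    FakeHistogram(FakeTime(100, 0), FakeTime(120, 0), "first"),
    FakeHistogram(FakeTime(100, 0), FakeTime(120, 0), "duplicate"),
    FakeHistogram(FakeTime(220, 0), FakeTime(120, 0), "second"),
]
unique = deduplicate_histograms(hists)
```

Keying a dict on the four integer fields avoids relying on equality or hashing of the time objects themselves, which may be why the PR spells the key out field by field.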

Testing:

  • Added a new test test_generate_histogram_dataset_deduplicates (in test_glows_l1a_cdf.py) to ensure the deduplication logic works correctly by asserting that all epochs in the resulting dataset are unique.
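The test above relies on external in-flight data. As a complement, a test on synthetic input could inject a known duplicate and check both the surviving records and the warning. The sketch below assumes a hypothetical helper `dedupe_first` operating on plain tuples rather than the real `generate_histogram_dataset`, and uses the standard-library `unittest.TestCase.assertLogs` instead of the project's pytest fixtures.

```python
import logging
import unittest

logger = logging.getLogger("glows_dedup_sketch")


def dedupe_first(records):
    """Hypothetical helper: keep the first record per (start, offset) key."""
    seen = {}
    for start, offset, payload in records:
        seen.setdefault((start, offset), (start, offset, payload))
    unique = list(seen.values())
    if len(unique) < len(records):
        logger.warning("Removed %d duplicates", len(records) - len(unique))
    return unique


class TestDedupe(unittest.TestCase):
    def test_duplicates_removed_and_warned(self):
        records = [(100, 120, "a"), (100, 120, "b"), (220, 120, "c")]
        with self.assertLogs(logger, level="WARNING") as captured:
            unique = dedupe_first(records)
        # The first occurrence wins and input order is preserved.
        self.assertEqual(unique, [(100, 120, "a"), (220, 120, "c")])
        self.assertEqual(len(captured.records), 1)

    def test_unique_input_is_unchanged(self):
        records = [(100, 120, "a"), (220, 120, "c")]
        self.assertEqual(dedupe_first(records), records)


# Run the suite programmatically so the sketch works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDedupe)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A synthetic case like this fails if the deduplication step is removed, whereas a uniqueness assertion on real data only fails when that data happens to contain duplicates.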

@laspsandoval laspsandoval added this to the May 2026 milestone May 11, 2026
@laspsandoval laspsandoval self-assigned this May 11, 2026
@laspsandoval laspsandoval added the Ins: GLOWS Related to the GLOWS instrument label May 11, 2026
@laspsandoval laspsandoval linked an issue May 11, 2026 that may be closed by this pull request
@laspsandoval laspsandoval requested a review from maxinelasp May 11, 2026 16:52
@tech3371 tech3371 requested a review from Copilot May 15, 2026 15:59
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses duplicate GLOWS L1A histogram records by adding a deduplication step in generate_histogram_dataset, keeping only the first occurrence per (imap_start_time, imap_time_offset) pair and emitting a warning when duplicates are removed. It also adds a regression test intended to validate that the resulting dataset has unique time records.

Changes:

  • Add deduplication of histogram records in generate_histogram_dataset keyed by IMAP start time and time offset.
  • Log a warning when duplicate histogram records are filtered out.
  • Add an external-data-based test intended to verify deduplication behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| imap_processing/glows/l1a/glows_l1a.py | Adds deduplication of histogram L1A records before building the xarray dataset. |
| imap_processing/tests/glows/test_glows_l1a_cdf.py | Adds a new external test asserting uniqueness of epochs in the generated histogram dataset. |

Comment on lines +86 to +97
@pytest.mark.external_test_data
def test_generate_histogram_dataset_deduplicates(in_flight_packet_path):
    hist_l0, _ = decom_packets(in_flight_packet_path)
    hist_l1a = [HistogramL1A(h) for h in hist_l0]
    glows_attrs = create_glows_attr_obj()

    dataset = generate_histogram_dataset(hist_l1a, glows_attrs)

    epochs = dataset["epoch"].values.tolist()
    assert len(epochs) == len(set(epochs))
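
The length comparison in the test above says only whether duplicates exist. A small helper that lists the repeated values would make a failure easier to diagnose. This is a sketch: `duplicated_values` is a hypothetical name and the integer epochs are made-up sample data.

```python
from collections import Counter


def duplicated_values(values):
    """Return a mapping of each repeated value to how often it occurs."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up epoch values; 10 and 20 each appear twice.
epochs = [10, 20, 20, 30, 10, 40]
dupes = duplicated_values(epochs)

# An empty mapping means every value is unique, so a test could use
# `assert not duplicated_values(epochs), duplicated_values(epochs)`
# to show the offending epochs in the failure message.
clean = duplicated_values([10, 20, 30])
```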


Comment on lines +341 to +352
# Deduplicate by (imap_start_time, imap_time_offset), keeping the first occurrence.
seen_times: dict = {}
for hist in hist_l1a_list:
    key = (
        hist.imap_start_time.seconds,
        hist.imap_start_time.subseconds,
        hist.imap_time_offset.seconds,
        hist.imap_time_offset.subseconds,
    )
    if key not in seen_times:
        seen_times[key] = hist
dedup_hists = list(seen_times.values())
Contributor

@tech3371 tech3371 left a comment

I don't know enough about the background, which is why I asked Copilot to review it. Since it didn't find any high-priority recommendations, this change looks good to me. If you are able, I agree with Copilot about updating the test to reflect the new changes.

@laspsandoval laspsandoval merged commit 4ca4f0a into IMAP-Science-Operations-Center:dev May 15, 2026
18 checks passed
@github-project-automation github-project-automation Bot moved this to Done in IMAP May 15, 2026
@laspsandoval laspsandoval deleted the glows_histogram_duplicates branch May 15, 2026 18:03

Labels

Ins: GLOWS Related to the GLOWS instrument

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

BUG - GLOWS L1a: histogram duplicates for some CDFs

3 participants